Skip to content

feat: dot notation for indexes create and databases load#92

Merged
eddietejeda merged 2 commits into
mainfrom
eddietejeda-dot-notation
May 20, 2026
Merged

feat: dot notation for indexes create and databases load#92
eddietejeda merged 2 commits into
mainfrom
eddietejeda-dot-notation

Conversation

@eddietejeda
Copy link
Copy Markdown
Contributor

@eddietejeda eddietejeda commented May 20, 2026

Summary

  • hotdata indexes create airbnb.listings[description] --type bm25 — replaces --connection-id / --schema / --table / --columns flags with a single positional arg using bracket notation for columns
  • hotdata databases load airbnb.listings --url <url> — new top-level shorthand replacing hotdata databases tables load airbnb listings --url ...
  • --name on indexes create is now optional; auto-derived as {table}_{cols}_{type} when omitted
  • Schema defaults to public in both commands when only db.table is given (no schema segment)
  • Add connections::resolve_connection_id(api, name_or_id) to look up a connection by name or raw ID
  • Remove the load: ... hint from databases create success output

Examples

# Create a BM25 index — name auto-derived as "listings_description_bm25"
hotdata indexes create airbnb.listings[description] --type bm25

# Load a parquet file (shorthand)
hotdata databases load airbnb.listings --url https://www.hotdata.dev/data/sf-airbnb-listings.parquet

Test plan

  • hotdata indexes create airbnb.listings[description] --type bm25 resolves connection by name and creates index
  • hotdata indexes create airbnb.public.listings[name,description] --type bm25 --name my_idx uses explicit name and schema
  • hotdata indexes create --dataset-id ds123 --columns col --type bm25 still works for dataset scope
  • hotdata databases load airbnb.listings --url <url> loads the table
  • hotdata databases load airbnb.public.listings --file ./data.parquet uses explicit schema
  • hotdata databases create no longer prints the load: hint

🤖 Generated with Claude Code

… notation for databases load

- `hotdata indexes create airbnb.listings[col1,col2] --type bm25`
  replaces verbose --connection-id/--schema/--table/--columns flags
- `hotdata databases load airbnb.listings --url ...`
  replaces `hotdata databases tables load airbnb listings --url ...`
- `--name` on indexes create is now optional (auto-derived from table+cols+type)
- Add `connections::resolve_connection_id` to look up connection by name or ID
- Remove load hint from `databases create` success output

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sentry
Copy link
Copy Markdown

sentry Bot commented May 20, 2026

Codecov Report

❌ Patch coverage is 41.95804% with 83 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/main.rs 47.61% 66 Missing ⚠️
src/connections.rs 0.00% 17 Missing ⚠️

📢 Thoughts on this report? Let us know!

Comment thread src/main.rs Outdated
);
std::process::exit(1);
});
let auto = format!("dataset_{type}", type = r#type);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: auto-name for the dataset branch is dataset_{type}, which collides for every additional index on the same dataset (e.g. creating BM25 indexes over two different columns both auto-name to dataset_bm25 and the second create will fail). The columns are already captured in cols here — mirroring the connection branch with something like format!("dataset_{cols}_{type}", cols = cols.replace(',', "_")) would make the default useful instead of guaranteed-collision. (not blocking)

Comment thread src/main.rs Outdated
Comment on lines +987 to +988
let table_part = &target[..bracket_pos];
let cols_raw = target[bracket_pos + 1..].trim_end_matches(']');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this silently accepts unclosed/garbled brackets. airbnb.listings[col (missing ]) parses successfully as cols_raw = "col" because trim_end_matches(']') does nothing when there's no ]. Likewise airbnb.listings[col]extra parses as a single column "col]extra" and is sent to the server. Worth requiring target.ends_with(']') (or checking that bracket_pos is followed by a matching ] at the end) and rejecting otherwise, so users get a clear local error rather than a confusing API error. (not blocking)

Comment thread src/main.rs
Comment on lines +963 to +1013
fn parse_db_target(target: &str) -> (String, String, String) {
let parts: Vec<&str> = target.splitn(4, '.').collect();
match parts.as_slice() {
[db, tbl] => (db.to_string(), "public".to_string(), tbl.to_string()),
[db, schema, tbl] => (db.to_string(), schema.to_string(), tbl.to_string()),
_ => {
eprintln!(
"error: target must be 'database.table' or 'database.schema.table'"
);
std::process::exit(1);
}
}
}

/// Parse an index target like `airbnb.listings[col1,col2]` or
/// `airbnb.public.listings[col1,col2]` into `(conn_name, schema, table, columns)`.
/// Schema defaults to `public` when only two dot-parts are given.
fn parse_index_target(target: &str) -> (String, String, String, Vec<String>) {
let Some(bracket_pos) = target.find('[') else {
eprintln!(
"error: target must include columns in brackets, e.g. airbnb.listings[col1,col2]"
);
std::process::exit(1);
};
let table_part = &target[..bracket_pos];
let cols_raw = target[bracket_pos + 1..].trim_end_matches(']');

let parts: Vec<&str> = table_part.splitn(4, '.').collect();
let (conn, schema, table) = match parts.as_slice() {
[c, t] => (c.to_string(), "public".to_string(), t.to_string()),
[c, s, t] => (c.to_string(), s.to_string(), t.to_string()),
_ => {
eprintln!(
"error: target must be 'connection.table[cols]' or 'connection.schema.table[cols]'"
);
std::process::exit(1);
}
};

let columns: Vec<String> = cols_raw
.split(',')
.map(|s| s.trim().to_string())
.filter(|s| !s.is_empty())
.collect();

if columns.is_empty() {
eprintln!("error: no columns specified in brackets");
std::process::exit(1);
}

(conn, schema, table, columns)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these two parsers (parse_db_target, parse_index_target) are pure, have several edge cases (2- vs 3-segment, default-public, bracket placement, empty cols, malformed dot counts), and are easy to regress. A small #[cfg(test)] mod tests covering happy paths + each error branch would lock the behavior down — especially since the existing databases_cli.rs tests don't exercise the new shorthand. (not blocking)

Comment thread src/command.rs
Comment on lines +331 to +334
/// Table and columns to index: `connection.table[col1,col2]`
/// or `connection.schema.table[col1,col2]`. Schema defaults to `public`.
#[arg(conflicts_with = "dataset_id")]
target: Option<String>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: [ and ] are glob metacharacters in zsh (and in bash with extglob/failglob), so hotdata indexes create airbnb.listings[description] --type bm25 will fail with no matches found for many users out of the box. Worth a line in the doc comment telling users to quote the target ('airbnb.listings[description]'). Same note would help on DatabasesCommands::Load even though it has no brackets, since the README example uses brackets in nearby commands. (not blocking)

Comment thread src/connections.rs
Comment on lines +161 to +178
pub fn resolve_connection_id(api: &ApiClient, name_or_id: &str) -> String {
let body: ListResponse = api.get("/connections");
match body
.connections
.iter()
.find(|c| c.id == name_or_id || c.name == name_or_id)
{
Some(conn) => conn.id.clone(),
None => {
use crossterm::style::Stylize;
eprintln!(
"{}",
format!("error: no connection named or with id '{name_or_id}'").red()
);
std::process::exit(1);
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: this lists every connection in the workspace even when the caller already has a raw ID. For workspaces with many connections (or once /connections paginates) this adds a needless round trip and could miss a valid ID past the first page. A cheap optimization: if name_or_id matches a known ID shape (e.g. starts with a known prefix or looks like a UUID), try GET /connections/{id} first and fall back to listing on 404. Fine to defer — the existing connections::list has the same limitation. (not blocking)

claude[bot]
claude Bot previously approved these changes May 20, 2026
- Dataset auto-name now includes columns (dataset_{cols}_{type}) to avoid collision
- Validate bracket closure in parse_index_target; reject unclosed/garbled brackets
- Add quoting note to indexes create doc comment for shell glob safety
- resolve_connection_id tries GET /connections/{id} directly for conn* prefixed IDs
- Add unit tests for parse_db_target and parse_index_target (6 tests)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment thread src/command.rs
/// Quote the argument to prevent shell glob expansion:
/// `hotdata indexes create 'airbnb.listings[description]' --type bm25`
#[arg(conflicts_with = "dataset_id")]
target: Option<String>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: README.md is now out of sync with this command — lines 252–261 still document hotdata indexes create --connection-id <id> --schema <schema> --table <table> --name <name> --columns <cols> ..., but this PR removes those flags from Create. A user copying that example will hit a clap error. Worth updating the README block to show the new connection.table[cols] form (and the dataset-scope path that now requires --columns), plus a one-line mention of the hotdata databases load shorthand alongside tables load. (not blocking)

@eddietejeda eddietejeda merged commit 6eb5854 into main May 20, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant